Minimal deterministic finite automata (DFAs) can be reduced further at the expense
of a finite number of errors. Recently, such minimization algorithms have been
improved to run in time $O(n \log n)$, where $n$ is the number of states of the input DFA, by
[Gawrychowski and Jez: Hyper-minimisation made efficient. Proc. MFCS, LNCS 5734,
2009] and [Holzer and Maletti: An n log n algorithm for hyper-minimizing a
(minimized) deterministic automaton. Theor. Comput. Sci. 411, 2010]. Both algorithms return
a DFA that is as small as possible, while only committing a finite number of errors. These
algorithms are further improved to return a DFA that commits the least number of errors
at the expense of an increased (quadratic) run-time. This solves an open problem of
[Badr, Geffert, and Shipman: Hyper-minimizing minimized deterministic finite state
automata. RAIRO Theor. Inf. Appl. 43, 2009]. In addition, an experimental study on
random automata is performed and the effects of the existing algorithms and the new
algorithm are reported.

1. Introduction

Deterministic finite automata (DFAs) [14] are used in a vast number of applications
that require huge automata like speech processing [11] or linguistic analysis [10].
To keep the operations efficient, minimal DFAs are typically used in applications. A
minimal DFA is such that all equivalent DFAs are at least as large, where the size is measured
by the number of states. The asymptotically fastest minimization algorithm runs in
time $O(n \log n)$ and is due to Hopcroft [9], where $n$ is the size of the input DFA.

*This is an extended and revised version of [A. Maletti: Better hyper-minimization — not as fast, 
but fewer errors. In Proc. CIAA, volume 6482 of LNCS, pages 201-210. Springer-Verlag, 2011].
†The work was carried out while the author was at the Departament de Filologies Romaniques,
Universitat Rovira i Virgili (Tarragona, Spain) and was supported by the Ministerio de Educacion
y Ciencia (MEC) grants JDCI-2007-760 and MTM-2007-63422.

A. Maletti and D. Quernheim

Recently, stronger minimization procedures, called hyper-minimization, have
been investigated [2, 1, 5, 7, 12]. They can efficiently compress minimal DFAs even
further at the expense of a finite number of errors. The fastest hyper-minimization
algorithms [5, 7] run in time $O(n \log n)$. More specifically, given an input DFA $M$, a
hyper-minimization algorithm returns a hyper-minimal DFA for $M$, which

• recognizes the same language as $M$ up to a finite number of errors, and

• is minimal among all DFAs with the former property (hyper-minimal).

In this contribution, we extend a known hyper-minimization algorithm to return
a hyper-optimal DFA for $M$, which is a hyper-minimal DFA for $M$ that commits the
least number of errors among all hyper-minimal DFAs for $M$. Moreover, the algorithm
returns the number of committed errors, which allows a user to disregard the
returned DFA if the number is unacceptably large. Our algorithm is essentially based
on a syntactic characterization of hyper-minimal DFAs for $M$ (see Theorems
3.8 and 3.9 of [2]). Roughly speaking, two hyper-minimal DFAs for $M$ differ in
exactly three aspects [2]: (i) the finality of the states $P$ that are reachable by only
finitely many strings, (ii) the transitions from states of $P$ to states not in $P$, and
(iii) the initial state. The characterization has two main uses: It allows us to
compute the exact number of errors for each hyper-minimal DFA for $M$, and it allows
us to easily consider all hyper-minimal DFAs for $M$ in order to find a hyper-optimal
DFA for $M$. We thus solve a remaining open problem of [2]. Unfortunately, the time
complexity of the obtained algorithm is $O(n^2)$, and it remains an open problem
whether the algorithm can be improved to run in time $O(n \log n)$.

Finally, we demonstrate hyper-minimization and the new algorithm on test DFAs,
which we generated from random non-deterministic finite automata [14, 13]. The
difficult cases for minimization that were identified in [13] also prove to be difficult
for hyper-minimization in the sense that only a small reduction is possible at the
expense of a significant number of errors. The new algorithm alleviates this problem
by avoiding a large number of mistakes. Outside the hard instances of [13],
plain hyper-minimization already reduces the size nicely at the expense of only a few errors.

2. Preliminaries 

The set of integers is $\mathbb{Z}$, and the subset of nonnegative integers is $\mathbb{N}$. If the symmetric
difference $S \mathbin{\triangle} T = (S \setminus T) \cup (T \setminus S)$ of two sets $S$ and $T$ is finite, then $S$ and $T$ are
almost-equal. Each finite set $\Sigma$ is an alphabet, and the set of all strings over $\Sigma$ is $\Sigma^*$.
The empty string is $\varepsilon$, and the concatenation of two strings $u, v \in \Sigma^*$ is denoted
by the juxtaposition $uv$. The length of the string $w = \sigma_1 \cdots \sigma_k$ with $\sigma_1, \dots, \sigma_k \in \Sigma$
is $|w| = k$. A string $u \in \Sigma^*$ is a prefix of $w$ if there exists a string $v \in \Sigma^*$ such that
$w = uv$. Any subset $L \subseteq \Sigma^*$ is a language over $\Sigma$.

A deterministic finite automaton (for short: DFA) is a tuple $M = (Q, \Sigma, q_0, \delta, F)$,
in which $Q$ is a finite set of states, $\Sigma$ is an alphabet of input symbols, $q_0 \in Q$ is
an initial state, $\delta \colon Q \times \Sigma \to Q$ is a transition mapping, and $F \subseteq Q$ is a set of
final states. The transition mapping $\delta$ extends to a mapping $\delta \colon Q \times \Sigma^* \to Q$ by
$\delta(q, \varepsilon) = q$ and $\delta(q, \sigma w) = \delta(\delta(q, \sigma), w)$ for every $q \in Q$, $\sigma \in \Sigma$, and $w \in \Sigma^*$. For
every $q \in Q$, let

\[ L(M, q) = \{w \in \Sigma^* \mid \delta(q_0, w) = q\} \quad\text{and}\quad L(q, M) = \{w \in \Sigma^* \mid \delta(q, w) \in F\} . \]

Intuitively, $L(M, q)$ contains all strings that take $M$ (from the initial state $q_0$) into
the state $q$, and $L(q, M)$ contains all strings that take $M$ from $q$ into a final state.
Moreover, $\mathrm{Ker}(M) = \{q \in Q \mid L(M, q) \text{ infinite}\}$ is the set of kernel states of $M$, and
$\mathrm{Pre}(M) = Q \setminus \mathrm{Ker}(M)$ is the set of preamble states. The sets $\mathrm{Ker}(M)$ and $\mathrm{Pre}(M)$
can be computed in time $O(m)$, where $m = |Q \times \Sigma|$. The DFA $M$ recognizes the
language $L(M) = L(q_0, M) = \bigcup_{q \in F} L(M, q)$.
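The split into kernel and preamble states can be illustrated with a short sketch. It is not taken from the paper: it assumes a complete DFA given as a dictionary `delta` from `(state, symbol)` pairs to states, and the function name `kernel_and_preamble` is hypothetical. The idea is that a reachable state is a kernel state exactly if it lies on, or behind, a reachable cycle, so repeatedly peeling off states without incoming transitions leaves the kernel.

```python
from collections import deque

def kernel_and_preamble(states, alphabet, delta, q0):
    """Return (Ker(M), Pre(M)) for a complete DFA (illustrative sketch)."""
    # Collect the states reachable from the initial state.
    reachable = {q0}
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                queue.append(r)

    # Count incoming transitions among reachable states (parallel edges count).
    indeg = {q: 0 for q in reachable}
    for q in reachable:
        for a in alphabet:
            indeg[delta[(q, a)]] += 1

    # Peel off states without incoming transitions; these are reached by
    # finitely many strings only. What remains is on or behind a cycle.
    queue = deque(q for q in reachable if indeg[q] == 0)
    peeled = set()
    while queue:
        q = queue.popleft()
        peeled.add(q)
        for a in alphabet:
            r = delta[(q, a)]
            indeg[r] -= 1
            if indeg[r] == 0:
                queue.append(r)

    kernel = reachable - peeled
    preamble = set(states) - kernel  # unreachable states have L(M, q) empty
    return kernel, preamble

# Toy DFA (an assumption for illustration): 0 -> 1 -> 2, and 2 loops.
delta = {(0, "a"): 1, (0, "b"): 1, (1, "a"): 2, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 2}
kernel, preamble = kernel_and_preamble([0, 1, 2], "ab", delta, 0)
print(kernel, preamble)  # {2} {0, 1}
```

Every transition is inspected a constant number of times, which is in line with the $O(m)$ bound stated above.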

An equivalence relation $\equiv \subseteq S \times S$ is a reflexive, symmetric, and transitive binary
relation. The equivalence class of an element $s \in S$ is $[s]_\equiv = \{s' \in S \mid s \equiv s'\}$ and
$[S]_\equiv = \{[s]_\equiv \mid s \in S\}$. A weak partition of $S$ is a set $\Pi$ such that (i) $A \subseteq S$ for
every $A \in \Pi$, (ii) $A_1 \cap A_2 = \emptyset$ for all different $A_1, A_2 \in \Pi$, and (iii) $S = \bigcup_{A \in \Pi} A$.
An equivalence relation $\equiv \subseteq Q \times Q$ on the states of the DFA $M = (Q, \Sigma, q_0, \delta, F)$ is
a congruence relation on $M$ if $\delta(q_1, \sigma) \equiv \delta(q_2, \sigma)$ for all $q_1 \equiv q_2$ and $\sigma \in \Sigma$.

Let $M = (Q, \Sigma, q_0, \delta, F)$ and $N = (P, \Sigma, p_0, \mu, G)$ be two DFAs. A mapping
$h \colon Q \to P$ is a transition homomorphism if $h(\delta(q, \sigma)) = \mu(h(q), \sigma)$ for every $q \in Q$
and $\sigma \in \Sigma$. If additionally $q \in F$ if and only if $h(q) \in G$ for every $q \in Q$, then $h$ is a
(DFA) homomorphism. In both cases, $h$ is an isomorphism if it is bijective. Finally,
we say that the DFAs $M$ and $N$ are (transition and DFA) isomorphic if there exists
a (transition and DFA, respectively) isomorphism $h \colon Q \to P$.

The DFAs $M$ and $N$ are equivalent if $L(M) = L(N)$. Clearly, (DFA) isomorphic
DFAs are equivalent. Two states $q \in Q$ and $p \in P$ are equivalent, denoted by $q \equiv p$, if
$L(q, M) = L(p, N)$.¹ The equivalence $\equiv \subseteq Q \times Q$ is a congruence relation on $M$. The
DFA $M$ is minimal if it does not have equivalent states (i.e., $q_1 \equiv q_2$ implies $q_1 = q_2$
for all $q_1, q_2 \in Q$). The name 'minimal' is justified by the fact that there does not
exist a DFA with strictly fewer states that recognizes the same language as a minimal
DFA. A minimal DFA that is equivalent to $M$ can be computed efficiently using
Hopcroft's algorithm [8], which runs in time $O(m \log n)$ where $m = |Q \times \Sigma|$ and
$n = |Q|$. Moreover, minimal DFAs are equivalent if and only if they are isomorphic.

Similarly, the DFAs $M$ and $N$ are almost-equivalent if $L(M)$ and $L(N)$ are
almost-equal. The states $q \in Q$ and $p \in P$ are almost-equivalent, which is denoted by
$q \sim p$, if $L(q, M)$ and $L(p, N)$ are almost-equal. The almost-equivalence $\sim \subseteq Q \times Q$
is also a congruence. The minimal DFA $M$ is hyper-minimal if it does not have
a pair $(q_1, q_2) \in Q \times Q$ of different, but almost-equivalent states such that
$\{q_1, q_2\} \cap \mathrm{Pre}(M) \neq \emptyset$. Again, the name 'hyper-minimal' is justified by the fact
that there does not exist a DFA with strictly fewer states that recognizes an
almost-equivalent language (see Theorem 3.4 of [2]). A hyper-minimal DFA that is
almost-equivalent to $M$ is called "hyper-minimal for $M$" and can be computed efficiently
using the algorithms of [5, 7], which also run in time $O(m \log n)$. A structural
characterization of hyper-minimal DFAs is presented in Theorems 3.8 and 3.9 of [2],
which we reproduce here.

¹While it might not be clear from the notation $q \equiv p$ to which DFA a state belongs, it will typically
be clear from the context. In particular, we might have $M = N$; i.e., we might relate two states
from the same DFA.

Algorithm 1 Structure of a hyper-minimization algorithm.
Require: a DFA $M = (Q, \Sigma, q_0, \delta, F)$ with $m = |Q \times \Sigma|$ and $n = |Q|$
   $M \gets$ Minimize($M$)                              // Hopcroft's algorithm; $O(m \log n)$
2: $\sim\ \gets$ CompAEquiv($M$)                        // compute almost-equivalence; $O(m \log n)$
   $M \gets$ MergeStates($M$, $\mathrm{Ker}(M)$, $\sim$)   // merge almost-equivalent states; $O(m)$
4: return $M$

Theorem 1 (see [2]). Let $M = (Q, \Sigma, q_0, \delta, F)$ and $N = (P, \Sigma, p_0, \mu, G)$ be
almost-equivalent DFAs. Then $\delta(q_0, w) \sim \mu(p_0, w)$ for every $w \in \Sigma^*$. In addition, if
$M$ and $N$ are hyper-minimal, then there exists a mapping $h \colon Q \to P$ such that

• $q \sim h(q)$ for every $q \in Q$,

• $h$ yields a transition isomorphism between $\mathrm{Pre}(M)$ and $\mathrm{Pre}(N)$, and

• $h$ yields a DFA isomorphism between $\mathrm{Ker}(M)$ and $\mathrm{Ker}(N)$.
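For small automata, the almost-equivalence $\sim$ on a minimal DFA can be computed with a naive fixpoint iteration. The sketch below is not the $O(m \log n)$ procedure of [5, 7]. It relies on the observation that, in a minimal DFA, two states are almost-equivalent exactly if all sufficiently long strings take them to the same state, so the relation can be grown from the identity by relating $p$ and $q$ whenever all their successors are already related. The function name `almost_equivalence` and the dictionary encoding of $\delta$ are assumptions made for illustration.

```python
from itertools import combinations

def almost_equivalence(states, alphabet, delta):
    """Blocks of almost-equivalent states of a minimal, complete DFA.

    Naive fixpoint sketch, polynomial but far from O(m log n).
    """
    related = {(q, q) for q in states}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if (p, q) in related:
                continue
            # p ~ q as soon as every symbol leads to already related states.
            if all((delta[(p, a)], delta[(q, a)]) in related for a in alphabet):
                related.add((p, q))
                related.add((q, p))
                changed = True

    # Group the states into blocks of the resulting equivalence relation.
    blocks = []
    for q in states:
        for block in blocks:
            if (q, block[0]) in related:
                block.append(q)
                break
        else:
            blocks.append([q])
    return [frozenset(block) for block in blocks]

# Toy minimal DFA over {a} (an assumption for illustration):
# A -> C, C -> D, D -> C with final states A and C.
delta = {("A", "a"): "C", ("C", "a"): "D", ("D", "a"): "C"}
print(almost_equivalence(["A", "C", "D"], "a", delta))
```

In the toy automaton the states A and D differ only on the empty string, so they end up in the same block, while C forms a block of its own.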

3. Hyper-minimization 

Hyper-minimization as introduced in [2] is a form of lossy compression with the
goal of reducing the size of a minimal DFA at the expense of a finite number of
errors. More formally, hyper-minimization aims to find a hyper-minimal DFA for an
input DFA. Several hyper-minimization algorithms exist [2, 1, 5, 7], and the overall
structure of the hyper-minimization algorithm of [7] is displayed in Algorithm 1. For
the following discussion let $M = (Q, \Sigma, q_0, \delta, F)$ be a DFA, and let $m = |Q \times \Sigma|$ and
$n = |Q|$ be the number of its transitions and the number of its states, respectively.

The most interesting component of Algorithm 1 is the merging process. In general,
the merge of a state $p \in Q$ into another state $q \in Q$ redirects all incoming
transitions of $p$ to $q$. If $p = q_0$ then $q$ is the new initial state. The finality of $q$ is
not changed even if $p$ is final. Clearly, the state $p$ can be deleted after the merge if
$p \neq q$. Formally, $\mathrm{merge}_M(p \to q) = (P, \Sigma, p_0, \mu, F)$, where $P = (Q \setminus \{p\}) \cup \{q\}$ and
for every $q' \in Q$ and $\sigma \in \Sigma$

\[
p_0 = \begin{cases} q & \text{if } q_0 = p \\ q_0 & \text{otherwise} \end{cases}
\qquad\text{and}\qquad
\mu(q', \sigma) = \begin{cases} q & \text{if } \delta(q', \sigma) = p \\ \delta(q', \sigma) & \text{otherwise.} \end{cases}
\]
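The merge operation translates almost literally into code. The following sketch uses a hypothetical tuple encoding of a DFA and a hypothetical function name `merge`. As an assumption that goes slightly beyond the formal definition, it restricts the set of final states to the remaining states, so that the deleted state $p$ does not linger in it.

```python
def merge(states, alphabet, delta, q0, finals, p, q):
    """Merge state p into state q (illustrative sketch).

    All incoming transitions of p are redirected to q, the initial state
    moves to q if it was p, and the finality of q is left unchanged.
    """
    if p == q:
        return set(states), dict(delta), q0, set(finals)
    new_states = set(states) - {p}          # P = (Q \ {p}) united with {q}
    new_delta = {}
    for s in new_states:
        for a in alphabet:
            t = delta[(s, a)]
            new_delta[(s, a)] = q if t == p else t
    new_q0 = q if q0 == p else q0
    new_finals = set(finals) - {p}          # assumption: drop the deleted state
    return new_states, new_delta, new_q0, new_finals

# Toy DFA over {a} (an assumption for illustration):
# A -> C, C -> D, D -> C with initial state A and final states A and C.
delta = {("A", "a"): "C", ("C", "a"): "D", ("D", "a"): "C"}
print(merge({"A", "C", "D"}, "a", delta, "A", {"A", "C"}, "A", "D"))
```

Merging A into D in the toy automaton removes A, makes D the initial state, and changes the recognized language only on the empty string.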

Lemma 2. Let $p, q \in Q$ and $N = \mathrm{merge}_M(p \to q)$. Then

\[ L(M) \mathbin{\triangle} L(N) = \{uw \mid u \in L(M, p),\ w \in L(p, M) \mathbin{\triangle} L(q, M)\} . \]

Consequently, $M$ and $\mathrm{merge}_M(p \to q)$ are almost-equivalent if $q \sim p$ and
$p \in \mathrm{Pre}(M)$. The hyper-minimization algorithms of [2, 1, 5, 7] only perform such
merges. More precisely, the procedure MergeStates merges almost-equivalent
states in the mentioned fashion until the obtained DFA is hyper-minimal. The
number of errors introduced in this way differs among the hyper-minimal DFAs
for $M$ and depends on the merges performed. In this contribution, we develop an
algorithm that computes a hyper-minimal DFA for $M$ that commits the minimal
number of errors among all hyper-minimal DFAs for $M$. A DFA $N$ is hyper-optimal
for $M$ if it is hyper-minimal and the cardinality of the symmetric difference
between $L(M)$ and $L(N)$ is minimal among all hyper-minimal DFAs. Note that a
hyper-optimal DFA for $M$ is hyper-minimal for $M$. Moreover, our algorithm returns
the exact number of errors, and we could also return a compact representation of
the actual error strings. Overall, we thus solve a problem that remained open in [2].

An extreme example is presented in Fig. 1. If we run the hyper-minimization
algorithms of [2, 1, 5, 7], then we obtain one of the two first DFAs of Fig. 2. Both of
them commit $2 + |L(M_1) \mathbin{\triangle} L(M_2)|$ errors. If we let $L(M_1) = \Sigma^k$ for some $k \in \mathbb{N}$ and
$L(M_2) = \emptyset$, then they commit $2 + |\Sigma|^k$ errors. On the other hand, the optimal DFA
is the third DFA of Fig. 2, and it commits only 2 errors (irrespective of $M_1$ and $M_2$).
This shows that the gap in the number of errors can be very significant.

4. Computing the number of errors 

Next, we show how to efficiently compute the number of errors that are caused by
a single merge (see Lemma 2). For this we first compute the size of the difference
between almost-equivalent states $p \sim q$. From now on, let $M = (Q, \Sigma, q_0, \delta, F)$ be a
minimal DFA. In our examples, we will always refer to our running example DFA $M_{ex}$,
which is presented in Fig. 3. Its kernel states are $\mathrm{Ker}(M_{ex}) = \{E, F, I, J, K, L, M\}$
and the following partition represents its almost-equivalence:

\[ \{0\} \quad \{A\} \quad \{B\} \quad \{C, D\} \quad \{E\} \quad \{F\} \quad \{G, H, I, J\} \quad \{K, L, M\} . \]

Fig. 2. Three hyper-minimal DFAs for the DFA of Fig. 1, where unbroken lines are a-transitions and
dashed lines are b-transitions.

Fig. 3. Example DFA $M_{ex}$ (left) and optimal hyper-minimal DFA $N_{ex}$ (right) for $M_{ex}$, where
unbroken lines are a-transitions and dashed lines are b-transitions.

In comparison to the DFA $M_{ex}$ of Fig. 3, the DFA $N_{ex}$ of Fig. 3 commits the following
seven errors: $\{aaaab, aaab, aab, aabab, aabb, abab, abb\}$. Note that existing algorithms
will only find hyper-minimal DFAs that commit 16 errors, and the worst
hyper-minimal DFA for $M_{ex}$ commits 29 errors.

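Definition 3. For every $q \sim p$, let

\[
E_{q,p} = \begin{cases}
0 & \text{if } q = p \\[2pt]
\displaystyle\sum_{\sigma \in \Sigma} E_{\delta(q,\sigma),\delta(p,\sigma)} + \begin{cases} 0 & \text{if } q \in F \iff p \in F \\ 1 & \text{otherwise} \end{cases} & \text{otherwise.}
\end{cases}
\]

Lemma 4. $E_{q,p} = |L(q, M) \mathbin{\triangle} L(p, M)|$ for every $q \sim p$.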

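Proof. Let $q \sim p$. Then $|L(q, M) \mathbin{\triangle} L(p, M)|$ is finite by definition, and we let
$k_{q,p} = \max \{|w| \mid w \in L(q, M) \mathbin{\triangle} L(p, M)\}$, where $\max \emptyset = -\infty$. Now, we prove
the statement by induction on $k_{q,p} \in \mathbb{N} \cup \{-\infty\}$. First, suppose that $k_{q,p} = -\infty$. Then
$L(q, M) = L(p, M)$, which yields that $q \equiv p$. Since $M$ is minimal, we conclude
that $q = p$ and $E_{q,p} = 0$, which proves the induction base. Second, suppose that
$k_{q,p} \geq 0$, and let $W = \{\sigma w \mid \sigma \in \Sigma,\ w \in L(\delta(q,\sigma), M) \mathbin{\triangle} L(\delta(p,\sigma), M)\}$. Obviously,
$W \subseteq L(q, M) \mathbin{\triangle} L(p, M) \subseteq W \cup \{\varepsilon\}$ and $k_{\delta(q,\sigma),\delta(p,\sigma)} < k_{q,p}$ for every $\sigma \in \Sigma$. The
empty string $\varepsilon$ is in $L(q, M) \mathbin{\triangle} L(p, M)$ if and only if $q$ and $p$ differ on finality.
Moreover, $E_{\delta(q,\sigma),\delta(p,\sigma)} = |L(\delta(q,\sigma), M) \mathbin{\triangle} L(\delta(p,\sigma), M)|$ for every $\sigma \in \Sigma$ by the induction
hypothesis. Since $k_{q,p} \geq 0$, we have $q \neq p$ and

\[
|L(q, M) \mathbin{\triangle} L(p, M)| = \sum_{\sigma \in \Sigma} E_{\delta(q,\sigma),\delta(p,\sigma)} + \begin{cases} 0 & \text{if } q \in F \iff p \in F \\ 1 & \text{otherwise} \end{cases} = E_{q,p} ,
\]

which proves the induction step and the statement. □

Algorithm 2 CompE: Compute the error matrix $E$.
Require: minimal DFA $M = (Q, \Sigma, q_0, \delta, F)$ and states $q \sim p$
Global: error matrix $E \in \mathbb{Z}^{Q \times Q}$, initially 0 on the diagonal and $-1$ elsewhere
   if $E_{q,p} = -1$ then
2:   $c \gets ((q \in F) \text{ xor } (p \in F))$                                        // set errors to 1 if $q$ and $p$ differ on finality
     $E_{q,p} \gets c + \sum_{\sigma \in \Sigma}$ CompE($M$, $\delta(q,\sigma)$, $\delta(p,\sigma)$)   // recursive calls
4: return $E_{q,p}$                                                                     // return the computed value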

Let us illustrate Algorithm 2 on the example DFA $M_{ex}$ of Fig. 3. We list some
error matrix entries together with the corresponding error strings. Note that the
error strings are not computed by the algorithm, but are presented for illustrative
purposes only.

$E_{G,H} = 5$  $\{\varepsilon, a, aa, ab, b\}$      $E_{H,I} = 4$  $\{\varepsilon, a, aa, ab\}$      $E_{K,L} = 3$  $\{\varepsilon, a, b\}$
$E_{G,I} = 1$  $\{b\}$                               $E_{H,J} = 1$  $\{\varepsilon\}$                   $E_{K,M} = 2$  $\{a, b\}$
$E_{G,J} = 4$  $\{a, aa, ab, b\}$                    $E_{I,J} = 3$  $\{a, aa, ab\}$                     $E_{L,M} = 1$  $\{\varepsilon\}$

Theorem 5. Algorithm 2 can be used to compute all $E_{q,p}$ with $q \sim p$ in
time $O(mn)$.

Proof. Clearly, the initialization and the recursion for $E_{q,p}$ are straightforward
implementations of its definition (see Definition 3). Moreover, each individual call
takes only time $O(|\Sigma|)$ besides the time taken for the recursive calls. Since each call
computes one entry in the matrix and no entry is ever recomputed, we obtain the
time complexity $O(|\Sigma| \cdot n^2) = O(mn)$ because $m = |\Sigma| \cdot n$. □
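A memoized recursion in the spirit of Algorithm 2 can be sketched as follows. The encoding is an assumption made for illustration: `delta` is a dictionary on `(state, symbol)` pairs, `finals` is a set, and `pairs` lists the almost-equivalent pairs whose entries are requested. The recursion terminates only for almost-equivalent states of a minimal DFA, exactly as in the proof of Lemma 4, and very deep recursions would require an iterative variant.

```python
def error_matrix(alphabet, delta, finals, pairs):
    """Compute E[q, p] = |L(q, M) symmetric-difference L(p, M)| for the given pairs.

    Sketch only: assumes a minimal, complete DFA and almost-equivalent pairs.
    """
    E = {}

    def comp_e(q, p):
        if q == p:                       # diagonal entries are 0
            return 0
        if (q, p) not in E:
            c = int((q in finals) != (p in finals))   # 1 if finality differs
            value = c + sum(comp_e(delta[(q, a)], delta[(p, a)])
                            for a in alphabet)
            E[(q, p)] = E[(p, q)] = value             # the matrix is symmetric
        return E[(q, p)]

    for q, p in pairs:
        comp_e(q, p)
    return E

# Toy minimal DFA over {a} (an assumption for illustration):
# A -> C, C -> D, D -> C with final states A and C, so that A ~ D.
delta = {("A", "a"): "C", ("C", "a"): "D", ("D", "a"): "C"}
print(error_matrix("a", delta, {"A", "C"}, [("A", "D")]))
```

In the toy automaton the states A and D disagree only on the empty string, so the single requested entry is 1.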

In addition, we need to compute the number of strings that lead to a preamble
state (see Lemma 2). This can easily be achieved with a folklore algorithm (see
Algorithm 3 and Lemma 4 of [4]) that computes the number of paths from $q_0$ to
each preamble state. Mind that the graph of the DFA $M$ restricted to its preamble
states $\mathrm{Pre}(M)$ is acyclic. Overall, the algorithm is very similar to Algorithm 2, but
we will not present a formal comparison here.

Algorithm 3 CompAccess: Compute the number of paths to a preamble state.
Require: a minimal DFA $M = (Q, \Sigma, q_0, \delta, F)$ and a preamble state $q \in \mathrm{Pre}(M)$
Global: access path vector $w \in \mathbb{N}^Q$, initially 1 at $q_0$ and 0 elsewhere
   if $w_q = 0$ then
2:   $w_q \gets \sum_{(p,\sigma) \in \delta^{-1}(q)}$ CompAccess($M$, $p$)   // recursive calls
   return $w_q$                                                              // return the computed value

Theorem 6 (see [4]). Algorithm 3 can be used to compute the number of paths to
each preamble state in time $O(m)$.

Proof. The correctness is obvious using the observation that $p$ is a preamble state
for every $(p, \sigma) \in \delta^{-1}(q)$ with $q \in \mathrm{Pre}(M)$. Clearly, the call CompAccess($M$, $q$)
terminates in constant time if the value $w_q$ has already been computed. Moreover,
each transition can be considered at most once in the sum in line 2, which yields
the time complexity $O(m)$. □

Algorithm 3 computes the following values for the DFA $M_{ex}$ of Fig. 3:

\[ w_0 = w_A = w_B = w_D = 1 , \qquad w_C = 2 , \qquad w_G = 3 , \qquad w_H = 6 . \]

Overall, we can now efficiently compute the number of errors (or a representation
of the errors themselves) caused by a single merge operation. However, multiple merges
may affect each other. An error that is introduced by one merge might be removed
by a subsequent merge, so that we cannot simply obtain the exact error count by
adding the error counts for all performed merges.

5. Optimal state merging

The previous section suggests how to compute a hyper-optimal DFA for a given
minimal DFA $M = (Q, \Sigma, q_0, \delta, F)$ with $m = |Q \times \Sigma|$ and $n = |Q|$. We can simply
compute the exact set of errors for each hyper-minimal DFA for $M$ and select a
DFA with a minimal error count. By Theorem 1 we can easily enumerate all
hyper-minimal DFAs for $M$, so that the above procedure would be effective. However, in
this section, we show that we can also obtain a hyper-optimal DFA using only local
decisions. This is possible since the structural differences among hyper-minimal DFAs
for $M$ mentioned in Theorem 1 cause different errors. Roughly speaking, Theorem 1
shows that two hyper-minimal DFAs for $M$ can only differ on

• the initial state,

• finality of preamble states, and

• transitions from preamble to kernel states.

Now, let us identify the strings and potential errors associated with each of
the three differences. Recall that $\sim$ is the almost-equivalence relating the states
of $M$. To simplify the following discussion, we introduce some additional notation.
For every $q \in Q$, let $K_q = \{p \in \mathrm{Ker}(M) \mid p \sim q\}$. In other words, the set $K_q$
contains all kernel states that are almost-equivalent to the state $q$. Moreover, let
$\mathrm{Pr}_\sim = \{B \in [Q]_\sim \mid B \subseteq \mathrm{Pre}(M)\}$ be the set of blocks of almost-equivalent and
exclusively preamble states. Now we define sets of strings that correspond to the
three types of differences mentioned above:

• Let $W_0 = \bigcup_{q \in K_{q_0}} \Sigma^*$.

• Let $W_B = \bigcup_{q \in B} L(M, q)$ for every $B \in \mathrm{Pr}_\sim$.

• For every $B \in \mathrm{Pr}_\sim$ and $\sigma \in \Sigma$ with $\bigcup_{q \in B} K_{\delta(q,\sigma)} \neq \emptyset$, let

\[ W_{B,\sigma} = \{u \sigma w \mid u \in W_B,\ w \in \Sigma^*\} . \]

Lemma 7. The following is a weak partition of $\Sigma^*$:

\[ \{W_0\} \cup \{W_B \mid B \in \mathrm{Pr}_\sim\} \cup \Bigl\{W_{B,\sigma} \Bigm| B \in \mathrm{Pr}_\sim,\ \sigma \in \Sigma,\ \bigcup_{q \in B} K_{\delta(q,\sigma)} \neq \emptyset\Bigr\} . \]

Proof. Clearly, $W_0 = \Sigma^*$ if $K_{q_0} \neq \emptyset$ or $W_0 = \emptyset$ otherwise. Suppose the former;
i.e., there exists $q \in K_{q_0}$. Let $p \in \mathrm{Pre}(M)$ be a preamble state. Since $M$ is minimal,
there exists a string $w \in L(M, p)$. Moreover, $p = \delta(q_0, w) \sim \delta(q, w)$ because $q_0 \sim q$
and $\sim$ is a congruence. Clearly, $\delta(q, w)$ is a kernel state due to the fact that $q$ is a
kernel state. Consequently, every preamble state $p \in \mathrm{Pre}(M)$ is almost-equivalent
to some kernel state, which proves that $[p]_\sim \notin \mathrm{Pr}_\sim$ for every $p \in \mathrm{Pre}(M)$. This yields
that the statement is correct if $K_{q_0} \neq \emptyset$.

In the second case, let $K_{q_0} = \emptyset$. Then $W_0 = \emptyset$. Clearly, $W_{B_1} \cap W_{B_2} = \emptyset$ for all
different $B_1, B_2 \in \mathrm{Pr}_\sim$ because $\{L(M, q) \mid q \in Q\}$ is a partition of $\Sigma^*$. Using the
same reasoning, we can show that $W_{B_1}$ and $W_{B_2,\sigma}$ are disjoint for all $B_1, B_2 \in \mathrm{Pr}_\sim$
and suitable $\sigma \in \Sigma$ using the additional observation that $K_{\delta(q_0,w)} \neq \emptyset$ for every
$w \in W_{B_2,\sigma}$, whereas $K_{\delta(q_0,w)} = \emptyset$ for every $w \in W_{B_1}$. Finally, let $B_1, B_2 \in \mathrm{Pr}_\sim$
and suitable $\sigma_1, \sigma_2 \in \Sigma$. Suppose that there exists $w \in W_{B_1,\sigma_1} \cap W_{B_2,\sigma_2}$. When
processing $w$ by $M$ there can only be one transition from a preamble state to a
kernel state, which in both cases has to be achieved by the letter $\sigma_1 = \sigma_2$. Moreover,
the state before taking this transition is unique, which yields that also $B_1 = B_2$.
Consequently, we have shown that all sets are disjoint.

It remains to prove that all of $\Sigma^*$ is covered. Let $w \in \Sigma^*$ be an arbitrary string.
If $K_{\delta(q_0,w)} = \emptyset$, then $w \in W_{[\delta(q_0,w)]_\sim}$. On the other hand, let $K_{\delta(q_0,w)} \neq \emptyset$. Then
there exists a prefix $u$ of $w$ such that $K_{\delta(q_0,u)} \neq \emptyset$ and $K_{\delta(q_0,v)} = \emptyset$ for all strict
prefixes $v$ of $u$. Then $w \in W_0$ if $u = \varepsilon$ and $w \in W_{[\delta(q_0,v)]_\sim,\sigma}$ where $u = v\sigma$ and
$\sigma \in \Sigma$. This concludes the proof. □

The previous lemma shows that error strings in the mentioned sets are independent
and cover all potential errors. For our example DFA $M_{ex}$ of Fig. 3 we have

\[ W_0 = \emptyset , \qquad W_{\{C,D\}} = \{aaa, aab, ab\} , \qquad W_{\{C,D\},a} = \{uaw \mid u \in W_{\{C,D\}},\ w \in \Sigma^*\} . \]

Next we address all individual differences between hyper-minimal DFAs for $M$. We
start with the initial state.

Lemma 8. If $K_{q_0} \neq \emptyset$, then each hyper-minimal DFA for $M$ is obtained by pruning
$\mathrm{merge}_M(q_0 \to q)$ for some $q \in K_{q_0}$. Moreover, it commits exactly $E_{q_0,q}$ errors.

Proof. Let $N = (P, \Sigma, p_0, \mu, G)$ be a hyper-minimal DFA for $M$. By Theorem 1
the DFA $N$ consists of only kernel states and is isomorphic to the subautomaton
of $M$ that is determined by $\mathrm{Ker}(M)$. Moreover, $q_0 \sim p_0$, which yields that
$N$ is isomorphic to $\mathrm{merge}_M(q_0 \to q)$ for some $q \in K_{q_0}$. By Lemma 2 we
have that $L(q, M) = L(q, N) = L(N)$ and $L(M) = L(q_0, M)$. This yields that
$L(M) \mathbin{\triangle} L(N) = L(q_0, M) \mathbin{\triangle} L(q, M)$, of which the size is $E_{q_0,q}$ by Lemma 4. □

We can compute the number $E_{q_0,q}$ of errors caused by the merge of $q_0$ into an
almost-equivalent kernel state $q \in K_{q_0}$ using Algorithm 2 of Section 4. This simple
test is implemented in lines 1-2 of Algorithm 5.

Second, let us consider a block $B \in \mathrm{Pr}_\sim$ of almost-equivalent preamble states.
Such a block must eventually be merged into a single preamble state $p$ in the
hyper-minimal DFA $N$, for which we need to determine finality because the preamble states
of two hyper-minimal DFAs for $M$ are only related by a transition isomorphism (see
Theorem 1).

Lemma 9. Let $B \in \mathrm{Pr}_\sim$ and $N = (P, \Sigma, p_0, \mu, G)$ be a hyper-minimal DFA for $M$.
Then $N$ commits either $\sum_{q \in B \cap F} w_q$ or $\sum_{q \in B \setminus F} w_q$ errors of $W_B$.

Proof. The set $W_B$ contains all strings that take the DFA $M$ into some state of $B$.
Moreover, all those strings take the hyper-minimal DFA $N$ into a single state $p \in P$;
i.e., $L(N, p) = W_B$ by Theorem 1. Let

\[ W_B^+ = \{w \in W_B \mid w \in L(M)\} \quad\text{and}\quad W_B^- = \{w \in W_B \mid w \notin L(M)\} ; \]

i.e., the partition into accepted and rejected strings (by $M$) of $W_B$, respectively.
Consequently, it is sufficient to compare the size of those sets because if $p \in G$ (i.e.,
$p$ is a final state of $N$), then all strings of $W_B^-$ are errors. This is due to the fact
that they are rejected by $M$, but accepted by $N$. On the other hand, the strings
of $W_B^+$ are errors if $p$ is non-final. Finally

\[
|W_B^+| = |\{w \in W_B \mid q \in F,\ w \in L(M, q)\}|
        = |\{w \in \Sigma^* \mid q \in B \cap F,\ w \in L(M, q)\}| = \sum_{q \in B \cap F} w_q ,
\]

and similarly, $|W_B^-| = \sum_{q \in B \setminus F} w_q$. □

Algorithm 4 CompFinality: Determine finality of a block of preamble states.
Require: a minimal DFA $M = (Q, \Sigma, q_0, \delta, F)$ and a block $B \in \mathrm{Pr}_\sim$
Global: error count $e$
   $(f, \bar{f}) \gets \bigl(\sum_{q \in B \cap F} w_q,\ \sum_{q \in B \setminus F} w_q\bigr)$   // errors for non-final and final state
2: $e \gets e + \min(f, \bar{f})$                                                              // add smaller value to global error count
   select $q \in B$ such that $q \in F$ if $f > \bar{f}$                                       // select appropriate state
4: return $q$                                                                                  // return selected state

Consequently, the preamble state $p \in P$ of $N$ should be accepting if and only if
more strings are accepted than rejected (i.e., $|W_B^+| > |W_B^-|$). This decision is codified
in Algorithm 4. On our example DFA $M_{ex}$ of Fig. 3 and the block $B = \{C, D\}$ it
compares $W_B^+ = \{aaa, ab\}$ and $W_B^- = \{aab\}$, and thus decides that the state $C$ of
the DFA $N_{ex}$ of Fig. 3 should be final. Note that Lemma 7 shows that the errors
are distinct for different blocks $B_1$ and $B_2$. All of the following algorithms will use
the global variable $e$, which will keep track of the number of errors. Initially, it will
be set to 0 and each discovered error will increase it. Finally, we assume that the
vector $w \in \mathbb{N}^Q$ (see Algorithm 3) and the error matrix $E \in \mathbb{Z}^{Q \times Q}$ (see Algorithm 2)
have already been computed and can be accessed in constant time.

Lemma 10. CompFinality($M$, $B$) adds the smallest number of errors
of $W_B$ committed by a hyper-minimal DFA $N$ for $M$. It runs in time $O(|B|)$ and
returns a final state (of $M$) if and only if $W_B \subseteq L(N)$.

Proof. Algorithm 4 implements the method of Lemma 9 in the given run-time. □
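The path counts of Algorithm 3 and the finality decision of Algorithm 4 fit into a few lines. The following is a minimal sketch, not the authors' implementation. The function names `access_counts` and `comp_finality`, the dictionary encoding of $\delta$, and the choice to return the error count instead of updating a global variable are assumptions made for illustration.

```python
def access_counts(alphabet, delta, q0, preamble):
    """w[q] = number of strings taking the DFA from q0 to preamble state q."""
    # Only preamble states can precede a preamble state, so the restricted
    # graph is acyclic and the memoized recursion below terminates.
    preds = {q: [] for q in preamble}
    for p in preamble:
        for a in alphabet:
            r = delta[(p, a)]
            if r in preds:
                preds[r].append(p)       # one entry per transition (p, a)
    w = {q0: 1}                          # the empty string reaches q0

    def count(q):
        if q not in w:
            w[q] = sum(count(p) for p in preds[q])
        return w[q]

    for q in preamble:
        count(q)
    return w

def comp_finality(block, finals, w):
    """Return (representative state, errors) for a block of preamble states."""
    errors_if_nonfinal = sum(w[q] for q in block if q in finals)
    errors_if_final = sum(w[q] for q in block if q not in finals)
    make_final = errors_if_nonfinal > errors_if_final
    representative = next(q for q in block if (q in finals) == make_final)
    return representative, min(errors_if_nonfinal, errors_if_final)

# Block {C, D} of the running example with w_C = 2, w_D = 1 and C final.
print(comp_finality(["C", "D"], {"C"}, {"C": 2, "D": 1}))  # ('C', 1)
```

For the block $\{C, D\}$ the sketch keeps $C$ as the final representative and reports one error, which corresponds to the rejected string $aab$ discussed above.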

For the third criterion, let us again consider a block $B \in \mathrm{Pr}_\sim$ of
almost-equivalent preamble states and a symbol $\sigma \in \Sigma$ such that $\bigcup_{q \in B} K_{\delta(q,\sigma)} \neq \emptyset$. Clearly,
$K_{\delta(q_1,\sigma)} = K_{\delta(q_2,\sigma)}$ for all $q_1, q_2 \in B$ because $\sim$ is a congruence on $M$. We need to
determine the kernel state that will be the new transition target. By Theorem 1 it
has to be a kernel state because $\delta(q, \sigma)$ is almost-equivalent to a kernel state.

Lemma 11. Let $N = (P, \Sigma, p_0, \mu, G)$ be a hyper-minimal DFA for $M$, and let
$B \in \mathrm{Pr}_\sim$ and $\sigma \in \Sigma$ be such that $K = \bigcup_{q \in B} K_{\delta(q,\sigma)} \neq \emptyset$. Then the DFA $N$ commits
$\sum_{q \in B} w_q \cdot E_{\delta(q,\sigma),q'}$ errors of $W_{B,\sigma}$ for some $q' \in K$.

Proof. Since $W_{B,\sigma} = \{u \sigma v \mid u \in W_B,\ v \in \Sigma^*\}$, each string $w \in W_{B,\sigma}$ has a
prefix $u\sigma$ with $u \in W_B$. Clearly, each $u \in W_B$ takes the DFA $M$ into some state
of $B$, and the hyper-minimal DFA $N$ into a state $p \in P$ such that $L(N, p) = W_B$
by Theorem 1. Moreover, $\mu(p, \sigma) = p'$ for some $p' \in \mathrm{Ker}(N)$, for which an equivalent
state $q' \in Q$ exists in $M$ by Theorem 1 because the kernels of $M$ and $N$ are DFA
isomorphic. Consequently, $L(p', N) = L(q', M)$ and $N$ accepts the strings

\[ \{u \sigma v \mid u \in W_B,\ v \in L(q', M)\} \subseteq W_{B,\sigma} \]

and rejects the remaining strings of $W_{B,\sigma}$. On the other hand, the DFA $M$ accepts
the strings $\bigcup_{q \in B} \{u \sigma v \mid u \in L(M, q),\ v \in L(\delta(q, \sigma), M)\} \subseteq W_{B,\sigma}$ and rejects the
remaining strings of $W_{B,\sigma}$. Clearly, $\delta(q, \sigma) \sim q'$. Consequently, the errors are exactly
$\bigcup_{q \in B} \{u \sigma v \mid u \in L(M, q),\ v \in L(\delta(q, \sigma), M) \mathbin{\triangle} L(q', M)\} \subseteq W_{B,\sigma}$, which yields the
$\sum_{q \in B} w_q \cdot E_{\delta(q,\sigma),q'}$ errors of $W_{B,\sigma}$ because the decomposition is unique. □

Recall that $w_q$ and $E_{q,p}$ have been pre-computed already. Next, we discuss the
full merging algorithm (see Algorithm 5). The initial state is handled in lines 1-2.
In lines 5-7 we first handle the already discussed decision for the finality of blocks $B$
of preamble states and perform the best merge into state $q$. In lines 8-11 we
determine the best target state for all transitions from a preamble to a kernel state.
The smallest error count is added to the global error count in line 10 and the
corresponding designated kernel state is selected as the new target of the transition
in line 11. This makes all preamble states that are almost-equivalent to this kernel
state unreachable, so they can be removed. On our example DFA $M_{ex}$ of Fig. 3 we
have that $\delta(C, a) = G$ is a transition from the block $\{C, D\} \in \mathrm{Pr}_\sim$ to a kernel state.
Consequently, we compare $\sum_{q \in \{C,D\}} w_q \cdot E_{\delta(q,a),q'}$ for all kernel states $q' \in K_G$:

\[
\sum_{q \in \{C,D\}} w_q \cdot E_{\delta(q,a),I} = 2 \cdot 1 + 1 \cdot 1 = 3
\qquad\text{and}\qquad
\sum_{q \in \{C,D\}} w_q \cdot E_{\delta(q,a),J} = 2 \cdot 4 + 1 \cdot 4 = 12 .
\]
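The comparison above can be replayed with a small sketch of the target selection. The helper name `best_target` is hypothetical. Only $\delta(C, a) = G$ is stated in the text; $\delta(D, a) = G$ is an assumption inferred from the arithmetic, since it is the only choice consistent with the listed entries $E_{G,I} = 1$ and $E_{G,J} = 4$.

```python
def best_target(block, sigma, delta, w, E, candidates):
    """Pick the kernel state that minimizes the weighted error count."""
    def errors(target):
        total = 0
        for q in block:
            source = delta[(q, sigma)]
            # E is 0 on the diagonal; other entries are looked up.
            total += w[q] * (0 if source == target else E[(source, target)])
        return total

    best = min(candidates, key=errors)
    return best, errors(best)

# Values of the running example (delta(D, a) = G is an inferred assumption).
delta = {("C", "a"): "G", ("D", "a"): "G"}
w = {"C": 2, "D": 1}
E = {("G", "I"): 1, ("G", "J"): 4}
print(best_target(["C", "D"], "a", delta, w, E, ["I", "J"]))  # ('I', 3)
```

The sketch selects $I$ with 3 errors, matching the smaller of the two sums, whereas the target $J$ would cost 12 errors.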

Theorem 12. Algorithm 5 runs in time $O(mn)$ and returns a hyper-optimal DFA
for $M$. In addition, the number of committed errors is returned.

Proof. The time complexity is easy to check, so we leave it as an exercise. Since
the choices (finality, transition target, initial state) are independent by Lemma 7,
all hyper-minimal DFAs for $M$ are considered in Algorithm 5 by Theorem 1.
Consequently, we can always select the local optimum for each choice (using Lemmata 8,
10, and 11) to obtain a global optimum, which proves that the returned number is
the minimal number of errors among all hyper-minimal DFAs. Mind that the number
of errors would be infinite for a hyper-minimal DFA that is not almost-equivalent
to $M$. Moreover, it is obviously the number of errors committed by the returned
DFA, which proves that the returned DFA is hyper-optimal for $M$. □

Corollary 13 (of Theorem 12). For every DFA $M$ we can obtain a hyper-optimal
DFA for $M$ in time $O(mn)$.

6. Empirical results 

In order to evaluate the algorithm, we compare it to another hyper-minimization
algorithm that does not aim for a low error profile. Since the algorithm of [7] is
("don't-care") non-deterministic (in the selection of merge targets), we implemented
a simple stack discipline, which always pops the first element. For a varying set of
parameters, 100 random DFAs have been generated and run through both
algorithms. The number of saved states as well as the number of errors are reported.
First we explain how the test DFAs were generated, describe the experimental setup,
and then present and discuss the results.

Algorithm 5 OptMerge: Optimal merging of almost-equivalent states.
Require: a minimal DFA $M = (Q, \Sigma, q_0, \delta, F)$ and its almost-equivalent states $\sim$
Global: error count $e$; initially 0
    if $K_{q_0} \neq \emptyset$ then
 2:   return $\bigl((Q, \Sigma, \operatorname{argmin}_{q \in K_{q_0}} E_{q_0,q}, \delta, F),\ \min_{q \in K_{q_0}} E_{q_0,q}\bigr)$
    $N \gets M$ where $N = (P, \Sigma, p_0, \mu, G)$                                                          // initialize output DFA
 4: for all $B \in \mathrm{Pr}_\sim$ do
      $q \gets$ CompFinality($M$, $B$)                                                                         // determine finality of merged state
 6:   for all $p \in B$ do
        $N \gets \mathrm{merge}_N(p \to q)$                                                                    // perform the merges
 8:   for all $\sigma \in \Sigma$ do
        if $K_{\delta(q,\sigma)} \neq \emptyset$ then
10:       $e \gets e + \min_{q' \in K_{\delta(q,\sigma)}} \sum_{p \in B} w_p \cdot E_{\delta(p,\sigma),q'}$    // add best error count
          $\mu(q, \sigma) \gets \operatorname{argmin}_{q' \in K_{\delta(q,\sigma)}} \sum_{p \in B} w_p \cdot E_{\delta(p,\sigma),q'}$   // update follow state
12: return $(N, e)$

We use an algorithm based on the original algorithm in Hanneforth's
FSM<2.0> library [6], which generates random non-deterministic finite automata.
This model is closely related to Karp's model of random directed graphs (see
Chapter 2 of [3] or [13] for a discussion of different models). The only difference is
the introduction of an additional parameter: the cyclicity $a$. The complete set of
parameters is as follows:

$|Q|$       This integer limits the number of states in the non-deterministic automaton.
$|\Sigma|$  This integer coincides with the number of alphabet symbols.
$d_\delta$  Uniform probability determining whether a given transition $p \xrightarrow{\sigma} q$ exists;
            we call $d_\delta \cdot |Q|$ the transition density.
$d_F$       Uniform probability for a given state to be final.
$a$         This real-valued parameter $0 \leq a \leq 1$ controls the cyclicity by constraining
            "backward-pointing" transitions. In particular, if $a = 0$, then the automaton
            will be acyclic, and if $a = 1$, then all transitions are equally probable.
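To make the role of these parameters concrete, the generation procedure described in this section can be sketched as follows. This is not the FSM<2.0> implementation. The function name `random_nfa`, the argument names, and the use of Python's `random` module are assumptions made for illustration; a transition to a larger state number is kept with probability $d_\delta$, any other transition with probability $a \cdot d_\delta$.

```python
import random

def random_nfa(n_states, alphabet, d_delta, d_final, cyclicity, rng=None):
    """Generate a random NFA as (states, initial state, transitions, finals)."""
    rng = rng or random.Random()
    states = range(n_states)
    # Each state is final with probability d_final.
    finals = {q for q in states if rng.random() < d_final}
    transitions = set()
    for q in states:
        for a in alphabet:
            for p in states:
                # Forward transitions (p > q) use d_delta; backward-pointing
                # ones are damped by the cyclicity parameter.
                threshold = d_delta if p > q else cyclicity * d_delta
                if rng.random() < threshold:
                    transitions.add((q, a, p))
    return set(states), 0, transitions, finals

# Usage with a fixed seed for reproducibility (parameter values are arbitrary).
states, initial, transitions, finals = random_nfa(
    30, "ab", d_delta=0.05, d_final=0.5, cyclicity=0.5, rng=random.Random(1))
print(len(states), len(transitions), len(finals))
```

With `cyclicity=0` every generated transition points to a strictly larger state number, so the automaton is acyclic, in line with the description of the parameter $a$.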

A non-deterministic automaton $M$ is generated in the following way: (i) The set of
states is $Q = \{0, 1, 2, \dots, |Q| - 1\}$ with initial state 0. (ii) A state $q \in Q$ is final if
and only if $r_q < d_F$, where $0 \leq r_q \leq 1$ is a random value. (iii) Finally, for every
$(q, \sigma, p) \in Q \times \Sigma \times Q$, we generate a random number $0 \leq r_{(q,\sigma,p)} \leq 1$. The transition
$q \xrightarrow{\sigma} p$ is present in $M$ if and only if

\[
r_{(q,\sigma,p)} < \begin{cases} d_\delta & \text{if } p > q \\ a \cdot d_\delta & \text{otherwise.} \end{cases}
\]

The latter case corresponds to "backward-pointing" transitions and creates cycles.

Fig. 4. Hyper-minimization performance for non-deterministic automata with 30 states, $|\Sigma| = 2$
and $0.3 \leq d_F \leq 0.7$. "Density" refers to $d_\delta \cdot |Q|$. Left: Average size of the minimal DFA. Right:
Ratio of states saved by hyper-minimization. Values range over the full scale; i.e., they approach 0
outside the ridge and inside the valley.

For each set of parameters, we have generated 100 DFAs. These DFAs were
obtained by determinizing and minimizing the randomly generated non-deterministic
test automata. All DFAs have then been hyper-minimized, and the optimal
hyper-minimal DFAs have been compared to the ones resulting from naive
hyper-minimization.² The obtained results are shown in Figs. 4 and 5.

Figure 4 shows the size of the minimal DFAs and the potential of saving states
by hyper-minimization. The left graph in Fig. 4 shows a ridge, which corresponds
to cases in which DFA minimization is hard and results in a large minimal DFA [13].
It is located around a transition density of $d_\delta \cdot |Q| = 1.25$ for a cyclicity of 1, and it
moves to higher densities for less cyclic automata. Essentially, the same ridge was
observed by [13] (for the case $a = 1$). The right graph in Fig. 4 shows that these hard
instances for DFA minimization are also hard for hyper-minimization in the sense
that only very few states can be saved. However, for the remaining instances a
considerable reduction in the number of states is achievable by hyper-minimization.

If we focus on the contribution of this paper, then we find that the number of
errors can be considerably reduced. Figure 5 shows the absolute number of errors for
hyper-minimal DFAs (left graph) and the ratio of errors avoided by the hyper-optimal
automaton (right graph). The absolute number of errors for the hard instances,
which can only be reduced a little, is higher than for the easy instances. However,
the hyper-optimal DFAs avoid a higher ratio of errors for the hard instances, which
dramatically reduces the number of committed errors paid for the small reduction.

Fig. 5. Hyper-optimization performance. Left: Absolute number of errors in naive hyper-minimal
DFAs. Right: Ratio of errors avoided by hyper-optimization.

²The complete C++ source code will be made available, and the FSM<2.0> library is available
at http://tagh.de/tom/?p=1737